16 Support Vector Machines

Algorithmics 2025

Area of Study 3: Computer science: past and present

Learning Intentions

Key knowledge

  • the concept of training algorithms using data
  • the concepts of model overfitting and underfitting
  • support vector machines (SVM) as margin-maximising linear classifiers, including:
    • the geometric interpretation of applying SVM binary classification to one- or two-dimensional data
    • the creation of a second feature from one-dimensional data to allow linear classification

Key skills

  • explain, at a high level, how data-driven algorithms can learn from data
  • explain the optimisation objectives for training SVM and neural network binary classifiers
  • explain how higher-dimensional data can be created to allow for linear classification

Machine Learning Algorithms

A machine learning algorithm is a procedure that allows a computer to improve its performance at a task by learning from data, rather than being given only explicit, hand-coded instructions.

  • It takes examples (data) as input.
  • It uses a model to find patterns or rules in that data.
  • It can then make predictions or decisions on new, unseen inputs.

Machine Learning Algorithms

  • Traditional algorithms: every step is written out by a programmer.

  • Machine learning algorithms: the computer adjusts its own internal rules (parameters) automatically, based on training data.

  • Examples:

    • Neural network – adjusts weights between “neurons” to recognise patterns.
    • Support vector machine (SVM) – finds the best boundary (hyperplane) to separate categories.
  • The machine can adjust the values of its own parameters, but it does not create them: which parameters exist is part of the model's design.
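As a minimal illustration of this idea (the data and the one-parameter threshold model are invented for this sketch, not part of the course materials), a program can "learn" a single parameter, a threshold t, from labelled examples instead of having it hard-coded:

```python
# A toy "learned" model with one parameter: the threshold t is chosen
# from the training data, not written by the programmer.

def train_threshold(data):
    """Return the threshold t that best separates the training data.

    data: list of (x, label) pairs with label in {-1, +1}.
    """
    xs = sorted(x for x, _ in data)
    # Candidate thresholds: midpoints between consecutive training points.
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]

    def accuracy(t):
        return sum(1 for x, y in data if (1 if x > t else -1) == y) / len(data)

    return max(candidates, key=accuracy)

train = [(-2, -1), (-1, -1), (1, +1), (2, +1)]
t = train_threshold(train)                      # learned from the data
predict = lambda x: 1 if x > t else -1          # classify new, unseen inputs
```

Here the programmer wrote the *procedure* for choosing t, but the value of t itself comes from the training examples.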

Support Vector Machines (SVMs)

  • A support vector machine (SVM) is a supervised machine learning algorithm.
  • Its main purpose is classification, especially binary classification (sorting data into one of two categories):
    • Email filtering (spam / not spam).
    • Image recognition (cat / not cat).
    • Medical diagnostics (disease / no disease).

Feature extraction or vectorisation

  • Features are measurable properties of the data (e.g. word counts, colours, weights).
  • Classification is assigning the data to a category based on those features.

Task: classify an email as spam or not spam.

  • Features might include:
    • Count of special words (e.g. “$$$”, “win”, “free”)
    • Number of links
    • Length of the email
    • Sender’s domain
  • The chosen features are combined into a feature vector, e.g. \(\mathbf{x} = (3, 1, 0, 4)\)
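The feature extraction step can be sketched in Python (the spam-word list and the exact features are invented for illustration; the sender's domain is a categorical feature that would need its own numeric encoding, so this sketch leaves it out):

```python
# Turn an email's text into a numeric feature vector.
# Features here (all invented for illustration):
#   1. count of "spam" words, 2. count of links, 3. length in words.

SPAM_WORDS = {"$$$", "win", "free"}

def email_features(text):
    """Return a (spam-word count, link count, word count) feature tuple."""
    words = text.lower().split()
    spam_count = sum(1 for w in words if w in SPAM_WORDS)
    link_count = sum(1 for w in words if w.startswith("http"))
    return (spam_count, link_count, len(words))

features = email_features("win free $$$ now http://x.example")
```

Every email, whatever its contents, is reduced to a vector of the same length, so different emails can be compared as points in the same space.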

Training the SVM

Training involves comparing a large set of pre-classified feature vectors (examples whose correct class is already known).

  • The SVM looks for the best separating boundary (called a hyperplane) between the two classes of data.

  • It chooses the hyperplane that maximises the margin:

    • the margin is the distance between the hyperplane and the closest data points

    • these closest data points are called the support vectors.
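A sketch of the optimisation idea, assuming subgradient descent on the regularised hinge loss as a stand-in for a real SVM solver (the toy data, learning rate and epoch count are invented; production SVMs use dedicated optimisers, and this version only approximates the exact maximum-margin boundary):

```python
# Toy linear-SVM training: nudge the boundary whenever a training point
# falls inside the margin, and shrink the weights slightly otherwise
# (the shrinking is what pushes the margin to be as wide as possible).

def train_linear_svm(points, labels, lr=0.01, lam=0.01, epochs=200):
    """Learn weights w and bias b for sign(w.x + b) on 2-D points."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), y in zip(points, labels):
            margin = y * (w[0] * x1 + w[1] * x2 + b)
            if margin < 1:  # point inside the margin: push the boundary away
                w[0] += lr * (y * x1 - lam * w[0])
                w[1] += lr * (y * x2 - lam * w[1])
                b += lr * y
            else:           # safely classified: only apply the margin penalty
                w[0] -= lr * lam * w[0]
                w[1] -= lr * lam * w[1]
    return w, b

# Two toy 2-D classes: +1 clustered near (2, 2), -1 near (-2, -2).
X = [(2, 2), (3, 2), (2, 3), (-2, -2), (-3, -2), (-2, -3)]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
predict = lambda p: 1 if w[0] * p[0] + w[1] * p[1] + b > 0 else -1
```

Only the points that end up near the boundary (the support vectors) keep triggering updates; points far from it stop influencing the result.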

Bias and Variance in Classification

Two types of errors when classifying data:

Bias - underfitting 🎯

  • Analogy: arrows clustered together but far from the bullseye
  • Comes from a model that is too simple
  • Misses the real patterns
  • Leads to systematic error (underfitting)

Variance - overfitting 🎯

  • Analogy: arrows scattered widely around the target
  • Comes from a model that is too complex
  • Fits the noise as well as the signal
  • Leads to unreliable predictions (overfitting)

The Trade-off

  • High bias → underfitting
  • High variance → overfitting
  • Goal = balance → arrows tightly grouped around the bullseye
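One way to see the trade-off in code (toy data invented for illustration): compare a deliberately simple model with a model that memorises the training set, on noisy 1-D data whose true rule is "positive when x > 0":

```python
# Two of the training labels are deliberately flipped to act as noise.
train = [(-3, -1), (-2, -1), (-1, 1), (1, 1), (2, -1), (3, 1)]
test = [(-2.6, -1), (-1.2, -1), (1.2, 1), (2.2, 1)]  # clean, unseen data

def accuracy(predict, data):
    return sum(1 for x, y in data if predict(x) == y) / len(data)

# Simple model: a fixed threshold at 0. It cannot fit the noisy points,
# but it matches the underlying signal.
simple = lambda x: 1 if x > 0 else -1

# Memoriser: copy the label of the nearest training point.
# Perfect on the training set, including its noise.
def memoriser(x):
    return min(train, key=lambda p: abs(p[0] - x))[1]
```

The memoriser scores 100% on the training data (high variance, overfitting) but does worse on the unseen test data; the simple threshold misses the noisy training points (its training error looks worse) yet generalises better.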

Support Vector Machines

Key Vocabulary for SVM

  • Support vector – the data points that are closest to the separating boundary; they determine the position of the hyperplane.
  • Margin – the distance between the separating hyperplane and the nearest support vectors; SVM maximises this.
  • Hyperplane – the boundary SVM draws to separate the classes (a line in 2D, a plane in 3D, etc.).
  • Bias – error caused by using a model that is too simple (underfitting).
  • Variance – error caused by a model that is too complex and too sensitive to training data (overfitting).
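The key-knowledge point about creating a second feature from one-dimensional data can be sketched as follows (the data are invented for illustration). Points near the origin cannot be split from points far from the origin by any single threshold on x, but after lifting each point to \((x, x^2)\) a horizontal line separates the two classes:

```python
# 1-D data that is NOT linearly separable: class +1 sits between the
# two class -1 points, so no single threshold on x works.
inside = [-1.0, 0.0, 1.0]    # class +1: near the origin
outside = [-3.0, 3.0]        # class -1: far from the origin

# Create a second feature from the one-dimensional data.
lift = lambda x: (x, x * x)

lifted_inside = [lift(x) for x in inside]
lifted_outside = [lift(x) for x in outside]

# In the lifted 2-D space the horizontal line x2 = 5 separates the
# classes, so a linear classifier such as an SVM can now be used.
```

The lifted data is linearly separable precisely because x² encodes distance from the origin, which is the property that actually distinguishes the classes.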